Sample Complexity of Policy Gradient Finding Second-Order Stationary Points

Authors

Abstract

Policy-based reinforcement learning (RL) can be viewed as the maximization of its objective. However, because this objective is inherently non-concave, convergence of the policy gradient method to a first-order stationary point (FOSP) cannot guarantee a maximal point. A FOSP can be a minimal or even a saddle point, which is undesirable for RL. It has been found that if all saddle points are strict, the second-order stationary points (SOSP) are exactly equivalent to local maxima. Instead of FOSP, we therefore use SOSP as the convergence criterion to characterize the sample complexity of policy gradient. Our result shows that policy gradient converges to an $(\epsilon, \sqrt{\epsilon\chi})$-SOSP with probability at least $1 - O(\delta)$ after a total cost of $O(\epsilon^{-9/2})$, which significantly improves the state-of-the-art cost of $O(\epsilon^{-9})$. Our analysis rests on the key idea of decomposing the parameter space $\mathbb{R}^p$ into three non-intersecting regions (the non-stationary point region, the saddle point region, and the local optimal region) and then making a local improvement of the RL objective in each region. This technique can potentially be generalized to a wide range of policy gradient methods. For the complete proof, please refer to https://arxiv.org/pdf/2012.01491.pdf.
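The difference between a FOSP and an SOSP, and the three-region split of the parameter space, can be illustrated with a small check. The sketch below assumes the usual definition for a maximization objective $J$: $\theta$ is an $(\epsilon, \sqrt{\epsilon\chi})$-SOSP when $\|\nabla J(\theta)\| \le \epsilon$ and the largest eigenvalue of $\nabla^2 J(\theta)$ is at most $\sqrt{\epsilon\chi}$, with $\chi$ read as a Hessian-Lipschitz constant. The helper names `classify_point` and `max_eigenvalue`, the region labels, and the toy objectives are hypothetical illustrations. They are not the paper's algorithm, and the paper's exact region thresholds may differ.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]
Matrix = List[List[float]]


def _matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain O(n^3) product of two square matrices."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def max_eigenvalue(hessian: Sequence[Vector], sweeps: int = 100, tol: float = 1e-20) -> float:
    """Largest eigenvalue of a real symmetric matrix via cyclic Jacobi rotations."""
    a: Matrix = [[float(v) for v in row] for row in hessian]
    n = len(a)
    for _ in range(sweeps):
        # Stop once the off-diagonal mass is negligible.
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that entry (p, q) of R^T A R becomes zero.
                phi = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(phi), math.sin(phi)
                rot = [[float(i == j) for j in range(n)] for i in range(n)]
                rot[p][p], rot[q][q], rot[p][q], rot[q][p] = c, c, s, -s
                rot_t = [list(col) for col in zip(*rot)]
                a = _matmul(rot_t, _matmul(a, rot))
    # a is now (numerically) diagonal, and its diagonal holds the eigenvalues.
    return max(a[i][i] for i in range(n))


def classify_point(grad: Vector, hessian: Sequence[Vector], eps: float, chi: float) -> str:
    """Hypothetical three-region label for theta under a *maximization* objective J."""
    grad_norm = math.sqrt(sum(g * g for g in grad))
    if grad_norm > eps:
        # Gradient still large: theta is not even an eps-FOSP.
        return "non-stationary"
    if max_eigenvalue(hessian) > math.sqrt(eps * chi):
        # eps-FOSP, but positive curvature remains, so J can still increase.
        return "saddle"
    # Both conditions hold: theta is an (eps, sqrt(eps * chi))-SOSP.
    return "local-optimal"


if __name__ == "__main__":
    eps, chi = 1e-2, 1.0  # hypothetical tolerance and Hessian-Lipschitz constant
    # J(theta) = theta1^2 - theta2^2 at theta = 0: a FOSP, but a saddle.
    print(classify_point([0.0, 0.0], [[2.0, 0.0], [0.0, -2.0]], eps, chi))  # saddle
    # J(theta) = -theta1^2 - theta2^2 at theta = 0: a local maximum.
    print(classify_point([0.0, 0.0], [[-2.0, 0.0], [0.0, -2.0]], eps, chi))  # local-optimal
    # Same J at theta = (1, 0): the gradient (-2, 0) has norm 2 > eps.
    print(classify_point([-2.0, 0.0], [[-2.0, 0.0], [0.0, -2.0]], eps, chi))  # non-stationary
```

The first toy point passes a FOSP-only test (zero gradient) yet is labeled a saddle. That is the failure mode the abstract describes when FOSP is used as the convergence criterion. A pure-Python Jacobi routine is used only to keep the sketch dependency-free. In practice a library symmetric eigensolver would be the natural choice.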

Similar resources

Complexity of finding near-stationary points of convex functions stochastically

In the recent paper [3], it was shown that the stochastic subgradient method, applied to a weakly convex problem, drives the gradient of the Moreau envelope to zero at the rate $O(k^{-1/4})$. In this supplementary note, we present a stochastic subgradient method for minimizing a convex function with the improved rate $\tilde{O}(k^{-1/2})$.

Convergence to Second Order Stationary Points in Inequality Constrained Optimization

We propose a new algorithm for the nonlinear inequality constrained minimization problem, and prove that it generates a sequence converging to points satisfying the KKT second order necessary conditions for optimality. The algorithm is a line search algorithm using directions of negative curvature and it can be viewed as a nontrivial extension of corresponding known techniques from unconstrain...

Asymptotic Property of Order Statistics and Sample Quantile

Abstract: Suppose that a function of epsilon is an infinite sum of weighted probabilities associated with the partial sums of a sequence of independent and identically distributed random variables, and suppose also that there exist functions g and h such that, whenever the expected value of x squared is finite and the expected value of x is zero, the limit of the product of these functions can be written as a function of the expected value of x squared. The converse also holds. In addition, using...

Second Order Properties of Locally Stationary Processes

In this paper we investigate an optimal property of the maximum likelihood estimator of Gaussian locally stationary processes by the second order approximation. In the case where the model is correctly specified, it is shown that appropriate modifications of the maximum likelihood estimator for Gaussian locally stationary processes are second order asymptotically efficient. We discuss second ord...

Convergence to Second-Order Stationary Points of a Primal-Dual Algorithm Model for Nonlinear Programming

We define a primal-dual algorithm model (SOLA) for inequality constrained optimization problems that generates a sequence converging to points satisfying the second order necessary conditions for optimality. This property can be enforced by combining the equivalence between the original constrained problem and the unconstrained minimization of an exact augmented Lagrangian function and the use ...

Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2021

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v35i12.17271